Non-distributive Aggregate Functions (CS 764: Advanced Database, Project Report)

Authors

  • Li Fan
  • Rushan Chen
Abstract

The ability to efficiently compute multiple related group-bys is critical to On-Line Analytical Processing (OLAP) and multidimensional data analysis. The computation of the CUBE, a special case of this aggregation problem, has been well studied [1]. However, to the best of our knowledge, previous work has focused primarily on computing aggregates over distributive and algebraic functions [3]. In this project, we investigated whether "holistic" aggregate functions can be computed efficiently. We started with a straightforward, non-optimized approach and experimented with alternatives in several directions, ranging from retaining more information to deliberately accepting less accurate results. We conclude that, while hardly any optimization is possible when the exact result of a holistic function must be computed over the CUBE, an approximation method we call bucket cube achieves an accuracy of 85% to 99% while incurring only moderately more overhead than computing the CUBE over common distributive functions.
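A toy example makes the distinction the abstract relies on concrete. In the usual data cube terminology, a distributive function such as SUM can be computed for a coarser group-by by merging the results of a finer one. A holistic function such as MEDIAN cannot, because no fixed-size summary of a cell is enough to combine exactly.

The sketch is a hypothetical illustration, not the report's implementation. The table `ROWS`, the fixed `BUCKET_WIDTH`, and the helpers `group_values`, `rollup`, `histogram` and `approx_median` are all invented for the example. The last step shows one generic way to trade accuracy for mergeability: it rolls up per-cell bucket counts and reads an approximate median off the merged histogram. The abstract does not describe how the report's bucket cube works, so that method may differ substantially.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical toy fact table: (region, product, measure).
ROWS = [
    ("east", "a", 1), ("east", "a", 2), ("east", "b", 9),
    ("west", "a", 4), ("west", "b", 5), ("west", "b", 6), ("west", "b", 7),
]
BUCKET_WIDTH = 2  # assumed fixed-width buckets; a real system would tune this


def group_values(rows, key):
    """Collect the measure values of each group produced by `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return dict(groups)


def rollup(fine_cells, combine, empty):
    """Derive the per-region group-by from (region, product) cells by merging partials."""
    out = {}
    for (region, _product), partial in fine_cells.items():
        out[region] = combine(out.get(region, empty), partial)
    return out


def histogram(values):
    """Bucket counts: a summary that merges exactly by adding counts."""
    return Counter(v // BUCKET_WIDTH for v in values)


def approx_median(hist):
    """Midpoint of the bucket holding the element of rank ceil(n/2) (the lower median)."""
    n = sum(hist.values())
    seen = 0
    for bucket in sorted(hist):
        seen += hist[bucket]
        if seen >= n / 2:
            return bucket * BUCKET_WIDTH + BUCKET_WIDTH / 2
    raise ValueError("empty histogram")


fine = group_values(ROWS, lambda r: (r[0], r[1]))  # finest group-by
coarse = group_values(ROWS, lambda r: r[0])        # computed from raw rows, for reference

# Distributive (SUM): merging partial sums gives the exact coarse answer.
sums = rollup({k: sum(v) for k, v in fine.items()}, lambda a, b: a + b, 0)

# Holistic (MEDIAN): the median of the cell medians is not the true median.
med_of_meds = {
    region: median(ms)
    for region, ms in rollup(
        {k: [median(v)] for k, v in fine.items()}, lambda a, b: a + b, []
    ).items()
}
exact = {region: median(v) for region, v in coarse.items()}

# Approximate: histograms merge exactly, so a bucketed median can be rolled up.
hists = rollup({k: histogram(v) for k, v in fine.items()}, lambda a, b: a + b, Counter())
approx = {region: approx_median(h) for region, h in hists.items()}

print(sums)         # {'east': 12, 'west': 22}
print(med_of_meds)  # {'east': 5.25, 'west': 5.0}
print(exact)        # {'east': 2, 'west': 5.5}
print(approx)       # {'east': 3.0, 'west': 5.0}
```

The bucketed estimate always lies within half a bucket width of the lower median. Narrowing the buckets tightens that bound but enlarges every cell's summary, which is the same kind of accuracy-versus-overhead trade-off the abstract describes.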

Similar resources

Incremental Maintenance for Non-Distributive Aggregate Functions

Incremental view maintenance is a well-known topic that has been addressed in the literature as well as implemented in database products. Yet, incremental refresh has been studied in depth only for a subset of the aggregate functions. In this paper we propose a general incremental maintenance mechanism that applies to all aggregate functions, including those that are not distributive over all ...

Optimization for Queries with Holistic Functions

The early grouping technique is a new method for optimizing aggregate queries. It provides more opportunities for the query optimizers to find optimal plans because all possible placements of the GROUP BY operators in the query trees are considered during the optimization process. However, to employ this technique, one of the requirements is that the aggregate function in the query must be dist...

Automatic Research Summaries in DBLife

The Cimple project on Community Information Management [4] is a project with the goal of developing a software platform for the effective management of data related to a given online community. The DBLife project [3] is a prototype system intended to help test and extend the ideas of the Cimple project, focused specifically on the database research community. In DBLife, each researcher in th...

Profiling the Resource Usage of OLTP Database Queries

This technical report contains eight final project reports contributed by ten participants in “Hot Topics in Database Systems,” a CMU advanced graduate course offered by Professor Anastassia Ailamaki in Fall 2002. The course covers advanced research issues in modern database system design through paper presentations and discussion. In Fall 2002, topics included query optimization, data stream a...

The Camelot Project

Camelot provides flexible and high performance transaction management, disk management, and recovery mechanisms that are useful for implementing a wide class of abstract data types, including large databases. To ensure that Camelot is accessible outside of the Carnegie Mellon environment, Camelot runs on the Unix-compatible Mach operating system and uses the standard Arpanet IP communication pr...

Publication date: 2007